Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering

نویسندگان

  • John Nerbonne
  • Peter Kleiweg
  • Wilbert Heeringa
  • Franz Manni
چکیده

Dialectometry produces aggregate distance matrices in which a distance is specified for each pair of sites. By projecting groups obtained by clustering onto geography one compares results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping dialect areas. The importance of dialect areas has been challenged by proponents of continua, but they too need to compare their findings to older literature, expressed in terms of areas. Simple clustering is unstable, meaning that small differences in the input matrix can lead to large differences in results (Jain et al. 1999). This is illustrated with a 500site data set from Bulgaria, where input matrices which correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce composite clustering, in which random noise is added to matrices during repeated clustering. The resulting borders are then projected onto the map. The present contribution compares Kleiweg et al.’s procedure to resampled bootstrapping, and also shows how the same procedure used to project borders from composite clustering may be used to project borders from bootstrapping.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Computational dialectology in Irish Gaelic

Dialect groupings can be discovered objectively and automatically by cluster analysis of phonetic transcriptions such as those found in a linguistic atlas. The first step in the analysis, the computation of linguistic distance.between each pair of sites, can be computed as Levenshtein distance between phonetic strings. This correlates closely with the much more laborious technique of determinin...

متن کامل

Extraction of Folksonomies from Noisy Texts

We built a system for the automatic creation of a text-based topic hierarchy, meant to be used in a geographically defined community. This poses two main problems. First, the appearance of both standard language and a community-related dialect, demanding that dialect words should be as much as possible corrected to standard words, and second, the automatic hierarchic clustering of texts by thei...

متن کامل

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

متن کامل

Clustering Microarray Data

Using microarray data, which gives thousands of genes’ expression levels at once, we examine different clustering techniques and evaluation methods to effectively cluster the noisy data. The results of this study has implications for the field of biology. Genes with similar functions are grouped together, which gives insight into specific genes and their role in the cell. The cluster analysis e...

متن کامل

Hybrid Bio-Inspired Clustering Algorithm for Energy Efficient Wireless Sensor Networks

In order to achieve the sensing, communication and processing tasks of Wireless Sensor Networks, an energy-efficient routing protocol is required to manage the dissipated energy of the network and to minimalize the traffic and the overhead during the data transmission stages. Clustering is the most common technique to balance energy consumption amongst all sensor nodes throughout the network. I...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007